Utterance-based Selective Training for Cost-effective Task-adaptation of Acoustic Models

Authors

  • Tobias Cincarek
  • Tomoki Toda
  • Hiroshi Saruwatari
  • Kiyohiro Shikano
Abstract

The construction of acoustic models for speech recognition systems is a very costly and time-consuming process, since their robust training requires large amounts of transcribed speech data, which have to be collected and labeled by humans. This paper describes an approach for cost-effective construction of task-adapted acoustic models. Existing speech data(bases) are employed to set up a large training data pool. Apart from that, only a small amount of task-specific speech data is required. Based on an algorithm for utterance-based selective training of acoustic models, training utterances are selected from the training data pool so that the likelihood of the acoustic model given the task-specific speech data is maximized. The proposed method is evaluated for acoustic models with context-independent and context-dependent phonetic units. Results are reported for building an infant (preschool children) acoustic model with speech from elementary school children and an elderly acoustic model with adult speech. The proposed approach is effective even when only 20 task-specific utterances are available. A relative improvement in word accuracy of up to 10% is achieved over conventional acoustic model construction and up to 2.8% over MAP and MLLR adaptation with the task-specific data. The gap in performance to an acoustic model trained on large amounts of task-specific data was reduced by up to 76%.
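
The selection idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes one-dimensional features, a single Gaussian standing in for the acoustic model, and a simple greedy deletion loop. All names (`stats`, `fit`, `log_likelihood`, `select_utterances`) are hypothetical.

```python
import math

def stats(frames):
    """Sufficient statistics (count, sum, sum of squares) of 1-D frames."""
    return (len(frames), sum(frames), sum(x * x for x in frames))

def fit(n, s, ss, var_floor=1e-3):
    """Maximum-likelihood Gaussian from accumulated statistics (floored variance)."""
    mean = s / n
    var = max(ss / n - mean * mean, var_floor)
    return mean, var

def log_likelihood(frames, mean, var):
    """Total log-likelihood of the frames under N(mean, var)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in frames
    )

def select_utterances(pool, task_frames):
    """Greedily delete pool utterances while the task-data likelihood improves.

    Assumption: every utterance in the pool is a non-empty list of floats.
    Returns the indices of the kept utterances and the final log-likelihood.
    """
    per_utt = [stats(u) for u in pool]
    selected = set(range(len(pool)))
    total = [sum(st[i] for st in per_utt) for i in range(3)]
    best = log_likelihood(task_frames, *fit(*total))
    while len(selected) > 1:
        candidate, candidate_ll = None, best
        for idx in selected:
            # Model re-estimated without utterance idx, using statistics only.
            n, s, ss = (total[i] - per_utt[idx][i] for i in range(3))
            ll = log_likelihood(task_frames, *fit(n, s, ss))
            if ll > candidate_ll:
                candidate, candidate_ll = idx, ll
        if candidate is None:
            break  # no single deletion improves the likelihood any further
        selected.remove(candidate)
        total = [total[i] - per_utt[candidate][i] for i in range(3)]
        best = candidate_ll
    return sorted(selected), best

# Toy data: two pool utterances far from the task data, two close to it.
pool = [[0.0, 0.1, -0.1], [0.2, -0.2, 0.05], [5.0, 5.2, 4.9], [5.1, 4.8, 5.0]]
task_frames = [5.0, 5.1, 4.9, 5.05]
kept, final_ll = select_utterances(pool, task_frames)
print(kept, final_ll)
```

Working from per-utterance sufficient statistics means each candidate deletion is evaluated without refitting on the raw frames, which keeps the greedy loop cheap even for a large pool.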

Similar resources

Acoustic modeling for spoken dialogue systems based on unsupervised utterance-based selective training

The construction of high-performance acoustic models for certain speech recognition tasks is very costly and time-consuming, since it most often requires the collection and transcription of large amounts of task-specific speech data. In this paper acoustic modeling for spoken dialogue systems based on unsupervised selective training is examined. The main idea is to select those training utteran...

Transcription Cost Reduction for Constructing Acoustic Models Using Acoustic Likelihood Selection Criteria

This paper describes a novel method for reducing the transcription effort in the construction of task-adapted acoustic models for a practical automatic speech recognition (ASR) system. We have to prepare actual data samples collected in the practical system and transcribe them for training the task-adapted acoustic models. However, transcribing utterances is a time-consuming and laborious proce...

Selective training of HMMs by using two-stage clustering

This paper proposes a method of constructing acoustic models from training data clustered in two stages. In the first stage, training data from a target task are clustered and generate GMMs for each cluster. The second stage uses the GMMs to select training data from a large-scale database based on the GMM likelihood. MAP estimation adapts an acoustic model for each cluster using the selected t...

On the limits of cluster-based acoustic modeling

This article reports a two-part study of structured acoustic modeling of speech. First, speaker-independent clustering of speech material was used as the basis for a practical cluster-based acoustic modeling. Each cluster’s training material is applied to the adaptation of baseline hidden Markov model (HMM) parameters for recognition purposes. Further, the training material of each cluster is a...

Selective MCE training strategy in Mandarin speech recognition

The use of discriminative training methods in speech recognition is a promising approach. The minimum classification error (MCE) based discriminative methods have been extensively studied and successfully applied to speech recognition [1][2][3], speaker recognition [4], and utterance verification [5][6]. Our goal is to modify the embedded string model based MCE algorithm to train a large number...

Publication date: 2006